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Abstract 

The row (resp. column) rank profile of a matrix describes the stair¬ 
case shape of its row (resp. column) echelon form. In an ISSACT3 paper, 
we proposed a recursive Gaussian elimination that can compute simulta¬ 
neously the row and column rank profiles of a matrix as well as those of all 
of its leading sub-matrices, in the same time as state of the art Gaussian 
elimination algorithms. Here we first study the conditions making a Gaus¬ 
sian elimination algorithm reveal this information. Therefore, we propose 
the definition of a new matrix invariant, the rank profile matrix, summa¬ 
rizing all information on the row and column rank profiles of all the leading 
sub-matrices. We also explore the conditions for a Gaussian elimination 
algorithm to compute all or part of this invariant, through the correspond¬ 
ing PLUQ decomposition. As a consequence, we show that the classical 
iterative GUP decomposition algorithm can actually be adapted to com¬ 
pute the rank profile matrix. Used, in a Grout variant, as a base-case 
to our ISSAGT3 implementation, it delivers a significant improvement in 
efficiency. Second, the row (resp. column) echelon form of a matrix are 
usually computed via different dedicated triangular decompositions. We 
show here that, from some PLUQ decompositions, it is possible to recover 
the row and column echelon forms of a matrix and of any of its leading 
sub-matrices thanks to an elementary post-processing algorithm. 


1 Introduction 

Triangular matrix decompositions are widely used in computational linear alge¬ 
bra. Besides solving linear systems of equations, they are also used to compute 
other objects more specific to exact arithmetic: computing the rank, sampling 
a vector from the null-space, computing echelon forms and rank profiles. 
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The row rank profile (resp. column rank profile) of an m x n matrix A with 
rank r, denoted by RowRP(A) (resp. ColRP(A)), is the lexicographically small¬ 
est sequence of r indices of linearly independent rows (resp. columns) of A. An 
mxn matrix has generic row (resp. column) rank profile if its row (resp. column) 
rank profile is (1, ..,r). Lastly, an m x n matrix has generic rank profile if its 
r first leading principal minors are non-zero. Note that if a matrix has generic 
rank profile, then its row and column rank profiles are generic, but the converse 


is false: the matrix 


0 1 
1 0 


does not have generic rank profile even if its row and 


column rank profiles are generic. The row support (resp. column support) of a 
matrix A, denoted by RowSupp(A) (resp. ColSupp(A)), is the subset of indices 
of its non-zero rows (resp. columns). 

We recall that the row echelon form of an m x n matrix A is an upper 
triangular matrix E = TA, for a non-singular matrix T, with the zero rows of 
E at the bottom and the non-zero rows in stair-case shape: min{_) : aij ^ 0} < 
min{_) : 0}. As T is non singular, the column rank profile of A is that 

of E, and therefore corresponds to the column indices of the leading elements in 
the staircase. Similarly the row rank profile of A is composed of the row indices 
of the leading elements in the staircase of the column echelon form of A. 


Rank profile and triangular matrix decompositions The rank profiles 
of a matrix and the triangular matrix decomposition obtained by Gaussian 
elimination are strongly related. The elimination of matrices with arbitrary 
rank profiles gives rise to several matrix factorizations and many algorithmic 
variants. In numerical linear algebra one often uses the PLUQ decomposition, 
with P and Q permutation matrices, L a lower unit triangular matrix and U 
an upper triangular matrix. The LSP and LQUP variants of [8] are used to 
reduce the complexity rank deficient Gaussian elimination to that of matrix 
multiplication. Many other algorithmic decompositions exist allowing fraction 
free computations [10], in-place computations [4, 9] or sub-cubic rank-sensitive 
time complexity [13, 9]. In [5] we proposed a Gaussian elimination algorithm 
with a recursive splitting of both row and column dimensions, and replacing 
row and column transpositions by rotations. This elimination can compute 
simultaneously the row and column rank profile while preserving the sub-cubic 
rank-sensitive time complexity and keeping the computation in-place. 

In this paper we first study the conditions a PLUQ decomposition algorithm 
must satisfy in order to reveal the rank profile structure of a matrix. We in¬ 
troduce in section 2 the rank profile matrix TZa, a normal form summarizing 
all rank profile information of a matrix and of all its leading sub-matrices. We 
then decompose, in section 3, the pivoting strategy of any PLUQ algorithm into 
two types of operations: the search of the pivot and the permutation used to 
move it to the main diagonal. We propose a new search and a new permutation 
strategy and show what rank profiles are computed using any possible combina¬ 
tion of these operations and the previously used searches and permutations. In 
particular we show three new pivoting strategy combinations that compute the 
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rank profile matrix and use one of them, an iterative Grout CUP with rotations, 
to improve the base case and thus the overall performance of exact Gaussian 
elimination. Second, we show that preserving both the row and column rank 
profiles, together with ensuring a monotonicity of the associated permutations, 
allows us to compute faster several other matrix decompositions, such as the 
LEU and Bruhat decompositions, and echelon forms. 

In the following, Omxn denotes the m x n zero matrix and denotes 

the sub-matrix of A of rows between i and j and columns between k and 1. To 
a permutation a : {!,... ,n} —>■ {!,... ,n} we define the associated permutation 
matrix P^, permuting rows by left multiplication: the rows of P^A are that of 
A permuted by a. Reciprocally, for a permutation matrix P, we denote by crp 
the associated permutation. 


2 The rank profile matrix 


We start by introducing in Theorem 1 the rank profile matrix, that we will use 
throughout this document to summarize all information on the rank profiles of a 
matrix. From now on, matrices are over a field K and a valid pivot is a non-zero 
element. 


Definition 1. An r-sub-permutation matrix is a matrix of rank r with only r 
non-zero entries equal to one. 


Lemma 1. An mxn r-suh-permutation matrix has at most one non-zero entry 


per row and per column, and can be written P 
Q are permutation matrices. 


0 


(m—r) X (n—r) 


Q where P and 


Theorem 1. Let A € There exists a unique mxn rank{A)-sub- 

permutation matrix TZa of which every leading sub-matrix has the same rank 
as the corresponding leading sub-matrix of A. This sub-permutation matrix is 
called the rank profile matrix of A. 


Proof. We prove existence by induction on the row dimension of the leading 
submatrices. 

If ^i,i..n = Oixn, setting = Oixn satisfies the defining condition. Oth¬ 
erwise, let j be the index of the leftmost invertible element in and set 

= ej the j-th n-dimensional row canonical vector, which satisfies the defin¬ 
ing condition. 

Now for a given i G {!,...,m}, suppose that there is a unique i x n 
rank profile matrix TZ^'^'> such that rank(Ai.,i^i,,j) = rank(7?.i,,ip,,j) for every 

j G {l..n}. If rank(Ai„i+pi„„) = rank(Ti,,j 4 „„), then ^ . 0th- 

[Ulxn_ 

erwise, consider fc, the smallest column index such that rank(^i.,i+i^i,,fc) = 


rank(Ai,,i,i,.fc) + I and set = 


. Any leading sub-matrix of 


has the same rank as the corresponding leading sub-matrix of A: first, for any 
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leading subset of rows and columns with less than i rows, the case is covered by 

= ^1..2+1,1..fcj where u, v are vectors and x 


the induction; second define 


T 

V X 


is a scalar. From the definition of k, v is linearly dependent with B and thus any 

leading sub-matrix of ^ has the same rank as the corresponding sub-matrix 

of Similarly, from the definition of k, the same reasoning works when 

considering more than k columns, with a rank increment by 1. 

Lastly we show that is a r^+i-sub-permutation matrix. Indeed, u is lin¬ 

early dependent with the columns of B: otherwise, rank( [i? rt]) = rank(i?) -|- 1. 

[B u] 

From the definition of k we then have rank( rp ) = rank{[B u]) -k 1 = 


rank(i?) + 2 = rank( ) + 2 which is a contradiction. Consequently, the fc-th 

column of is all zero, and is a r-sub-permutation matrix. 

To prove uniqueness, suppose there exist two distinct rank profile matrices 
7^(l) and 7^(2) for a given matrix A and let {i,j) be some coordinates where 

^ = ^u.Li.i..i-i- Then, rank(>li,,,,i„j) = 

rank(7?.r^y j) yf rank(7?.r^y ^■) = rank(kli,.i^i..j) which is a contradiction. □ 


Example 1. A = 


'2 0 3 O' 


'1 0 0 0' 

10 0 0 

has TZa = 

0 0 10 

0 0 4 0 

0 0 0 0 

0 2 0 1 


0 10 0 


for rank profile matrix over 


Remark 1. The matrix E introduced in Malaschonok’s LEU decomposition [12, 
Theorem 1], is in fact the rank profile matrix. There, the existence of this 
decomposition was only shown for m = n = 2^, and no connection was made 
to the relation with ranks and rank profiles. This connection was made in [5, 
Corollary 1], and the existence of E generalized to arbitrary dimensions m and 
n. Finally, after proving its uniqueness here, we propose this definition as a 
new matrix normal form. 

The rank profile matrix has the following properties: 

Lemma 2. Let A he a matrix. 

1. TZa is diagonal if A has generic rank profile. 

2. TZa is a permutation matrix if A is invertible 

3. RowRP{A) = RowSupp{TZa); ColRP{A) = CoISupp{TZa) . 

Moreover, for all 1 < i < m and 1 < j < we have: 

4 . RowRP{Ai i i j) = RowSupp{{TZA)i..iA..j) 

5. ColRP{Ai,,i,i..j) = ColSupp{(TZA)i..i,i..j), 

These properties show how to recover the row and column rank profiles of 
A and of any of its leading sub-matrix. 
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3 Ingredients of a PLUQ decomposition algo¬ 
rithm 


Over a field, the LU decomposition generalizes to matrices with arbitrary rank 
profiles, using row and column permutations (in some cases such as the CUP, 
or LSP decompositions, the row permutation is embedded in the structure of 
the C or S matrices). However such PLUQ decompositions are not unique and 
not all of them will necessarily reveal rank profiles and echelon forms. We will 
characterize the conditions for a PLUQ decomposition algorithm to reveal the 
row or column rank profile or the rank profile matrix. 

We consider the four types of operations of a Gaussian elimination algorithm 
in the processing of the A:-th pivot: 

Pivot search: finding an element to be used as a pivot. 

Pivot permutation: moving the pivot in diagonal position (fc, k) by column 
and/or row permutations. 

Update: applying the elimination at position 

Normalization: dividing the k-th. row (resp. column) by the pivot. 

Choosing how each of these operation is done, and when they are scheduled 
results in an elimination algorithm. Conversely, any Gaussian elimination algo¬ 
rithm computing a PLUQ decomposition can be viewed as a set of specializations 
of each of these operations together with a scheduling. 

The choice of doing the normalization on rows or columns only determines 
which of [/ or L will be unit triangular. The scheduling of the updates vary 
depending on the type of algorithm used: iterative, recursive, slab or tiled block 
splitting, with right-looking, left-looking or Grout variants [2]. Neither the 
normalization nor the update impact the capacity to reveal rank profiles and 
we will thus now focus on the pivot search and the permutations. 

Choosing a search and a permutation strategy fixes the matrices P and Q 
of the PLUQ decomposition obtained and, as we will see, determines the ability 
to recover information on the rank profiles. Once these matrices are fixed, the 
L and the U factors are unique. We introduce the pivoting matrix. 

Definition 2. The pivoting matrix of a PLUQ decomposition A = PLUQ of 
rank r is the r-suh-permutation matrix 


np,Q = P 


0 


(m—r) X (n—r) 


Q. 


The r non-zero elements of Hp q are located at the initial positions of the 
pivots in the matrix A. Thus Hp^g summarizes the choices made in the search 
and permutation operations. 


Pivot search The search operation vastly differs depending on the field of 
application. In numerical dense linear algebra, numerical stability is the main 
criterion for the selection of the pivot. In sparse linear algebra, the pivot is 
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chosen so as to reduce the fill-in produced by the update operation. In order to 
reveal some information on the rank profiles, a notion of precedence has to be 
used: a usual way to compute the row rank profile is to search in a given row 
for a pivot and only move to the next row if the current row was found to be all 
zeros. This guarantees that each pivot will be on the first linearly independent 
row, and therefore the row support of ^p,q will be the row rank profile. The 
precedence here is that the pivot’s coordinates must minimize the order for the 
first coordinate (the row index). As a generalization, we consider the most 
common preorders of the cartesian product {!,... m} x {1,... n} inherited from 
the natural orders of each of its components and describe the corresponding 
search strategies, minimizing this preorder: 

Row order: ^row (*2, j2) iff *i < *2: search for any invertible element 

in the first non-zero row. 

Column order: (fi, ji) ^coi (*2, J2) iff Ji < J2- search for any invertible ele¬ 
ment in the first non-zero column. 

Lexicographic order: (ii, ji) ^lex (* 2 ,j 2 ) iff h < *2 or ii = i^ and ji < J 2 : 

search for the leftmost non-zero element of the first non-zero row. 
Reverse lexicographic order: (zi,ji) ^reviex (*2,j2) iff ji < J2 or ji = ^2 
and ii < ii'. search for the topmost non-zero element of the first non-zero 
column. 

Product order: ^prod (*2,^2) iff ii < *2 and ji < ji'. search for any 

non-zero element at position {i,j) being the only non-zero of the leading 
{i,j) sub-matrix. 


0 0 0 a 6 
0 c d e f 
g h i j k 
I m n o p 

element. The minimum non-zero elements for each preorder are the following: 


Example 2. Consider the matrix 


, where each literal is a non-zero 


Row order 

a, b 

Column order 

9,1 

Lexicographic order 

a 

Reverse lexic. order 

9 

Product order 

a,c,g 


Pivot permutation The pivot permutation moves a pivot from its initial 
position to the leading diagonal. Besides this constraint all possible choices 
are left for the remaining values of the permutation. Most often, it is done 
by row or column transpositions, as it clearly involves a small amount of data 
movement. However, these transpositions can break the precedence relations in 
the set of rows or columns, and can therefore prevent the recovery of the rank 
profile information. A pivot permutation that leaves the precedence relations 
unchanged will be called /c-monotonically increasing. 

Definition 3. A permutation of a G Sn is called k-monotonically increasing if 
its last n — k values form a monotonically increasing sequence. 
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In particular, the last n — k rows of the associated row-permutation matrix 
Per are in row echelon form. For example, the cyclic shift between indices k and 
i, with k < i defined as Rk,i = ( 1 ,..., fc — l,i,k,k + 1,... ,i — l,i + 1,... ,n), 
that we will call a (fc, t)-rotation, is an elementary fc-monotonically increasing 
permutation. 


Example 3. The (1,4)-rotation Pi 4 = (4,1,2,3) 
ing permutation. Its row permutation matrix is 


is a 1-monotonically increas- 


0 1 
1 

1 


In fact, any {k, i)- 


1 0 

rotation is a k-monotonically increasing permutation. 


Monotonically increasing permutations can be composed as stated in Lemma 3. 


Lemma 3. If ai G Sn is a ki-monotonically increasing permutation and 02 G 
Ski ^ ^n-ki a k 2 -monotonically increasing permutation with ki < ^2 then the 
permutation (T 2 o cti is a k 2 -monotonically increasing permutation. 


Proof. The last n — /c 2 values of 172 o cri are the image of a sub-sequence of n — ^2 
values from the last n — ki values of (Ti through the monotonically increasing 
function (T 2 - D 


Therefore an iterative algorithm, using rotations as elementary pivot per¬ 
mutations, maintains the property that the permutation matrices P and Q at 
any step k are fc-monotonically increasing. A similar property also applies with 
recursive algorithms. 


4 How to reveal rank profiles 


A PLUQ decomposition reveals the row (resp. column) rank profile if it can be 
read from the first r values of the permutation matrix P (resp. Q). Equivalently, 
by Lemma 2, this means that the row (resp. column) support of the pivoting 
matrix Hp.g equals that of the rank profile matrix. 


Definition 4. The decomposition A = PLUQ reveals: 

1. the row rank profile if RowSupp{Jlp^Q) = RowSupp{1Za), 

2 . the col. rank profile if ColSuppfUp^Q) = CoISupp{1Za), 

3. the rank profile matrix ifllpQ = R-a- 


Example 4. A = 


2 0 3 0 


10 0 0 

10 0 0 

has TZa = 

0 0 10 

0 0 4 0 

0 0 0 0 

0 2 0 1 


0 10 0 


for rank profile matrix over 


Q. Now the pivoting matrix obtained from a PLUQ decomposition with a pivot 
search operation following the row order (any column, first non-zero row) could 
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he the matrix lip^q 


. As these matrices share the same row support, 


0 0 10 
10 0 0 
0 0 0 0 
^0 10 0 ^ 

the matrix IIp.Q reveals the row rank profile of A. 


Remark 2. Example 4, suggests that a pivot search strategy minimizing row 
and column indices could be a sufficient condition to recover both row and col¬ 
umn rank profiles at the same time, regardless the pivot permutation. However, 
this is unfortunately not the case. Consider for example a search based on the 
lexicographic order (first non-zero column of the first non-zero row) with trans¬ 


position permutations, run on the matrix: A 


0 0 1 
2 3 0 


Its rank profile matrix 


is Ha 


0 0 1 
10 0 


whereas the pivoting matrix could he Ilp^g 


0 0 1 

0 10 ’ 


which 


does not reveal the column rank profile. This is due to the fact that the col¬ 
umn transposition performed for the first pivot changes the order in which the 
columns will be inspected in the search for the second pivot. 


We will show that if the pivot permutations preserve the order in which 
the still unprocessed columns or rows appear, then the pivoting matrix will 
equal the rank profile matrix. This is achieved by the monotonically increasing 
permutations. 

Theorem 2 shows how the ability of a PLUQ decomposition algorithm to 
recover the rank profile information relates to the use of monotonically increas¬ 
ing permutations. More precisely, it considers an arbitrary step in a PLUQ 
decomposition where k pivots have been found in the elimination of an £ x p 
leading sub-matrix Ai of the input matrix A. 


Theorem 2. Consider a partial PLUQ decomposition of an m x n matrix A: 


A = Fi 


'Ll 

'Ui Vi' 

_Mi Im-k_ 

H 


where 


Li 

Ml 


is mxk lower triangular and [Ui Vi\ is kxn upper triangular, and 

let Ai be some £ x p leading sub-matrix of A, for £,p > k. Let H = P 2 L 2 U 2 Q 2 
be a PLUQ decomposition of H. Consider the PLUQ decomposition 


A = P, 


'Ik 

Li 

'Ui ViQl 

'Ik 

Pa 

Pi Ml L 2 


Q 2 


Qi 


Consider the following clauses: 

(i) RowRP{Ai) = RowSupp{Jlp.,^^Qf) 

(ii) ColRP{Ai) = ColSuppilip^^qf) 

(Hi) TZai = IIpj^Qi 

(iv) RowRP{H) = RowSupp(Jlpr^^q.f) 
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(v) ColRP{H) = ColSuppiJlp^^Q^) 

(vi) TZh = ^P 2 ,Q 2 

(vii) Pi is k-monotonically increasing or (P^ is i-monotonically increasing 
and p = n) 

(via) Qf is k-monotonieally increasing or (Qf is p-monotonically increasing 
and £ = m) 


Then, 

(a) if (i) or (ii) or (Hi) then H = * 

(b) if (vii) then ((i) and (iv)) ^ RowRP{A) = RowSuppfUp^Q); 

(c) if (via) then ((ii) and (v)) => ColRP{A) = ColSuppfllp^Q); 

(d) if (vii) and (viii) then (Hi) and (vi) => TZa = Apq. 


Proof. Let Pi = [Pn Pi] and Qi = 
(n — k) X n. On one hand we have 


Qii 

Pi 


where Pi is m x (m 


k) and Pi is 


A = [Pii Pi] 


Li 

Ml 


[Ui ^i] 


Qii 

Pi 


+P 1 PP 1 . 


On the other hand, 
np,Q = Pi 

= Pi 


■4 

■4 

■4 

Pa 

0(m—r) X (n—r) 

<52 


n 


0,(32 


Qi 


Ql = + PlIIp^^Q^Pl. 


( 1 ) 


Ai 


0 


0 0 


{ m — t ) X (n—p) 


and denote by Pi the £xp leading sub-matrix 


Let Ai = 
of P. 

(a) The clause (i) or (ii) or (iii) implies that all k pivots of the partial elimination 
were found within the £ x p sub-matrix Ai. Hence rank (Hi) = k and we 


can write Pi = 
Ai writes 


Pi 

0 ( rr).— 


X k 


Pi 


and Ql = 


Qll ^ kx { n — p ) 

Pi 


Hi = [/, 0] H 


0 


= Bi+ [h 0] EiHFi 


0 


, and the matrix 

( 2 ) 


Now rank(Pi) = fc as a sub-matrix of P of rank k and since 


Pi 


[Pii [h 0] • Pi] 


T 


Qll 

Ml 

[Ui Vi] 

El- 

Up] 

oj 


PiiLiUiQii + [Q 0] EiMi [Ui th] Ql 


Ip 

0 
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where the first term, PuLiUiQn, has rank k and the second term has a 
disjoint row support. 


Finally, consider the term [ig O] EiHFi 


0 


of equation (2). As its row 


support is disjoint with that of the pivot rows of Bi, it has to be composed of 
rows linearly dependent with the pivot rows of Bi to ensure that rank(Ai) = 
k. As its column support is disjoint with that of the pivot columns of 
Bi, we conclude that it must be the zero matrix. Therefore the leading 
{i — k) X {p — k) sub-matrix of EiHFi is zero. 

(b) From (a) we know that Ai = Bi. Thus RowRP(i3) = RowRP(Ai). Recall 
that A = B + EiHFi. No pivot row of B can be made linearly dependent 
by adding rows of EiHFi, as the column position of the pivot is always 
zero in the latter matrix. For the same reason, no pivot row of EiHFi can 
be made linearly dependent by adding rows of B. From (i), the set of pivot 
rows of B is RowRP(Ai), which shows that 


RowRP(A) = RowRP(Ai) U RowRP(F;iiJAi). (3) 


Let '■ fc} —>■ {1 ..to} be the map representing the sub-permutation 

El (i.e. such that Ei[aEi{i)A] = 1 Vi). If Pi is fc-monotonically increasing, 
the matrix Ei has full column rank and is in column echelon form, which 
implies that 


RowRP{EiHFi) = aE,{RowRP{HFi)) 

= (TB,(RowRP(iL)), (4) 


since Fi has full row rank. If P^ is i monotonically increasing, we can write 
El = [Ell Ei 2 \ , where the m x (to — £) matrix E 12 is in column echelon 


form. If p = n, the matrix iJ writes H = 


X (n—k) 

H2 


Hence we have 


EiHFi = E 12 H 2 F 1 which also implies 


RowRR{EiHFi) = crE,{RowRP{H)). 

From equation (2), the row support of Hp^g is that of Hp^^g^ -l-Ainp^^g^Fi, 
which is the union of the row support of these two terms as they are dis¬ 
joint. Under the conditions of point (b), this row support is the union of 
RowRP(Ai) and crp^(RowRP(iL)), which is, from (4) and (3), RowRP(A). 

(c) Similarly as for point (b). 

(d) From (a) we have still Ai = Bi. Now since rank(i?) = rank(i3i) = 
rank(Ai) = fc, there is no other non-zero element in TZb than those in 

and TZb = The row and column support oi TZb and that of 

EiHFi are disjoint. Hence 


E-a = + TZeiHFi ■ (5) 

If both P^ and Qf are fc-monotonically increasing, the matrix Ei is in 
column echelon form and the matrix Fi in row echelon form. Consequently, 
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the matrix EiHFi is a copy of the matrix H with k zero-rows and k zero- 
columns interleaved, which does not impact the linear dependency relations 
between the non-zero rows and columns. As a consequence 


Ti-EiHFi = EiTZhFi. ( 6 ) 

Now if is fc-monotonically increasing, is Amonotonically increasing 
and p = n, then, using notations of point (b), EiHFi = E 12 H 2 F 1 where E 12 
is in column echelon form. Thus TZeiHFi = EiTZhFi for the same reason. 
The symmetric case where Qf is p-monotonically increasing and £ = m 
works similarly. Combining equations (2), (5) and (6) gives TZa = flp^g. q 


5 Algorithms for rank profiles 

Using Theorem 2, we deduce what rank profile information is revealed by a 
PLUQ algorithm by the way the Search and the Permutation operations are 
done. Table 1 summarizes these results, and points to instances known in the 
literature, implementing the corresponding type of elimination. More precisely, 
we first distinguish in this table the ability to compute the row or column rank 
profile or the rank profile matrix, but we also indicate whether the resulting 
PLUQ decomposition preserves the monotonicity of the rows or columns. Indeed 
some algorithm may compute the rank profile matrix, but break the precedence 
relation between the linearly dependent rows or columns, making it unusable as 
a base case for a block algorithm of higher level. 


Search 

Row Perm. 

Col. Perm. 

Reveals 

Monotonicity 

Instance 

Row order 

Transposition 

Transposition 

RowRP 


[8, 9] 

Col. order 

Transposition 

Transposition 

ColRP 


[11, 9] 


Transposition 

Transposition 

RowRP 


[13] 

Lexicographic 

Transposition 

Rotation 

RowRP, ColRP, TZ 

Col. 

here 


Rotation 

Rotation 

RowRP, ColRP, TZ 

Row, Col. 

here 


Transposition 

Transposition 

ColRP 


[13] 

Rev. lexico. 

Rotation 

Transposition 

RowRP, ColRP, TZ 

Row 

here 


Rotation 

Rotation 

RowRP, ColRP, TZ 

Row, Col. 

here 


Rotation 

Transposition 

RowRP 

Row 

here 

Product 

Transposition 

Rotation 

ColRP 

Col 

here 


Rotation 

Rotation 

RowRP, ColRP, TZ 

Row, Col. 

[5] 


Table 1: Pivoting Strategies revealing rank profiles 

5.1 Iterative algorithms 

We start with iterative algorithms, where each iteration handles one pivot at 
a time. Here Theorem 2 is applied with k = 1, and the partial elimination 
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represents how one pivot is being treated. The elimination of H is done by 
induction. 

Row and Column order Search The row order pivot search operation is of 
the form: any non-zero element in the first non-zero row. Each row is inspected 
in order, and a new row is considered only when the previous row is all zeros. 
With the notations of Theorem 2, this means that Ai is the leading i x n 
sub-matrix of A, where i is the index of the first non-zero row of A. When 
permutations P\ and Qi, moving the pivot from position {£,j) to (fc, fc) are 
transpositions, the matrix is the element of the canonical basis. Its 

row rank profile is {£) which is that of the £xn leading sub-matrix Ai. Finally, 
the permutation Pi is .^-monotonically increasing, and Theorem 2 case (b) can 
be applied to prove by induction that any such algorithm will reveal the row 
rank profile: RowRP(^) = RowSupp(np^Q). The case of the column order 
search is similar. 

Lexicographic order based pivot search In this case the Pivot Search 
operation is of the form: first non-zero element in the first non-zero row. The 
lexicographic order being compatible with the row order, the above results hold 
when transpositions are used and the row rank profile is revealed. If in addition 
column rotations are used, Qi = Rij which is 1-monotonically increasing. Now 
IIpi^Qi = Egj which is the rank profile matrix of the £ x n leading sub-matrix 
Ai of A. Theorem 2 case (d) can be applied to prove by induction that any 
such algorithm will reveal the rank profile matrix: TZa = IIp^Q. Lastly, the use 
of row rotations, ensures that the order of the linearly dependent rows will be 
preserved as well. Algorithm 1 is an instance of Gaussian elimination with a 
lexicographic order search and rotations for row and column permutations. 

The case of the reverse lexicographic order search is similar. As an example, 
the algorithm in [13, Algorithm 2.14] is based on a reverse lexicographic order 
search but with transpositions for the row permutations. Hence it only reveals 
the column rank profile. 

Product order based pivot search The search here consists in finding any 
non-zero element Ae^p such that the £ xp leading sub-matrix Ai of A is all zeros 
except this coefficient. If the row and column permutations are the rotations 
Ri i and we have Hp,^ = E^ p = TZai ■ Theorem 2 case (d) can be applied 
to prove by induction that any such algorithm will reveal the rank profile matrix: 
TZa = Ilp g. An instance of such an algorithm is given in [5, Algorithm 2]. If Pi 
(resp. Qi) is a transposition, then Theorem 2 case (c) (resp. case (b)) applies 
to show by induction that the columns (resp. row) rank profile is revealed. 

5.2 Recursive algorithms 

A recursive Gaussian elimination algorithm can either split one of the row or 
column dimension, cutting the matrix in wide or tall rectangular slabs, or split 
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both dimensions, leading to a decomposition into tiles. 


Slab recursive algorihtms Most algorithms computing rank profiles are slab 
recursive [8, 11, 13, 9]. When the row dimension is split, this means that the 
search space for pivots is the whole set of columns, and Theorem 2 applies with 
p = n. This corresponds to a either a row or a lexicographic order. From 
case( b), one shows that, with transpositions, the algorithm recovers the row 
rank profile, provided that the base case does. If in addition, the elementary 
column permutations are rotations, then case (d) applies and the rank profile 
matrix is recovered. Finally, if rows are also permuted by monotonically increas¬ 
ing permutations, then the PLUQ decomposition also respects the monotonicity 
of the linearly dependent rows and columns. The same reasoning holds when 
splitting the column dimension. 


writes H = 


Tile recursive algorithms Tile recursive Gaussian elimination algorithms [5, 
12 , 6] are more involved, especially when dealing with rank deficiencies, and we 
refer to [5] for a detailed description of such an algorithm. Here, the search 
area Ai has arbitrary dimensions £ x p, often specialized as m/2 x n/2. As a 
consequence, the pivot search can not satisfy neither row, column, lexicographic 
or reverse lexicographic orders. Now, if the pivots selected in the elimination 
of Ai minimizes the product order, then they necessarily also respect this or¬ 
der as pivots of the whole matrix A. Now, from (a), the remaining matrix H 

-k^{p-k) 12 elimination is done by two independent 

£121 1122 ] 

eliminations on the blocks H 12 and H 21 , followed by some update of H 22 and 
a last elimination on it. Here again, pivots minimizing the row order on H 21 
and Hi 2 are also pivots minimizing this order for H, and so are those of the 
fourth elimination. Now the block row and column permutations used in [5, 
Algorithm 1] to form the PLUQ decomposition are r-monotonically increasing. 
Hence, from case (d), the algorithm computes the rank profile matrix and pre¬ 
serves the monotonicity. If only one of the row or column permutations are 
rotations, then case (b) or (c) applies to show that either the row or the column 
rank profile is computed. 


6 Rank profile matrix based triangularizations 

6.1 LEU decomposition 

The LEU decomposition introduced in [12] involves a lower triangular matrix 
L, an upper triangular matrix U and a r-sub-permutation matrix E. 

Theorem 3. Let A = PLUQ he a PLUQ decomposition revealing the rank 
profile matrix (Hp.Q = TZa)- Then an LEU decomposition of A with E = TZa 
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is obtained as follows (only using row and column permutations): 


A = P [L Omx{n-r)] P 


QQ^ 


U 

0(n—r) xn 
U 


Q 


( 7 ) 


Proof. First E = P 


0 


Q = IIp^Q = TZa- Then there only needs to show that 


L is lower triangular and U is upper triangular. Suppose that L is not lower 
triangular, let i be the first row index such that Lij ^ 0 for some i < j. First 
j S RowRP(A) since the non-zero columns in L are placed according to the 

U 


0 E-r 


Q, and since 


first r values of P. Remarking that A = P [L Omx{n-r)] 

right multiplication by a non-singular matrix does not change row rank profiles, 
we deduce that RowRP(npQ) = RowRP(A) = RowRP(L). If i ^ RowRP(^), 
then the t-th row of L is linearly dependent with the previous rows, but none 
of them has a non-zero element in column j > i. Hence i S RowRP(H). 

Let {a,b) be the position of the coefficient Lij in L, that is a = (Tp^(i), b = 
ap^{j). Let also s = crgia) and t = aQ{b) so that the pivots at diagonal position 
a and b in L respectively correspond to ones in TZa at positions (i, s) and (j, t). 
Consider the ixp leading sub-matrices Ai of A where i = Tnayix=i..a-i{<yp{x)) 
and p = maxa;=i.,o_i(crQ(a;)). On one hand (j, f) is an index position in Ai but 
not (i,s), since otherwise rank(y4i) = b. Therefore, (z, s) -fiprod {jA)^ and s>t 
as z < j. As coefficients (j, t) and (z, s) are pivots in TZa and i < j and t < s, 
there can not be a non-zero element above (j, t) at row i when it is chosen as a 
pivot. Hence Lij = 0 and L is lower triangular. The same reasoning applies to 
show that U is upper triangular. □ 

Remark 3. Note that the LEU decomposition with E — TZa is not unique, even 
for invertible matrices. As a counter-example, the following decomposition holds 
for any a € K.- 

( 8 ) 


0 1' 


'1 o' 


0 1 


1 —a 

1 0 


a 1 


1 0 


0 1 


6.2 Bruhat decomposition 

The Bruhat decomposition, that has inspired Malaschonok’s LEU decomposi¬ 
tion [12], is another decomposition with a central permutation matrix [1, 7]. 

Theorem 4 ([!]). Any invertible matrix A can be written as A = VPU for V 
and U uppper triangular invertible matrices and P a permutation matrix. The 
latter decomposition is called the Bruhat decomposition of A. 

It was then naturally extended to singular square matrices by [7]. Corollary 1 
generalizes it to matrices with arbitrary dimensions, and relates it to the PLUQ 
decomposition. 
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Corollary 1. Any mxn matrix of rank r has a VPU decomposition, where V 
and U are upper triangular matrices, and P is a r-sub-permutation matrix. 

Proof. Let Jn be the unit anti-diagonal matrix. From the LEU decomposition 
of JnA, we have A = JnLJn JnE U where V is upper triangular. □ 

V P 


6.3 Relation to LUP and PLU decompositions 

The LUP decomposition A = LUP only exists for matrices with generic row 
rank profile (including matrices with full row rank). Corollary 2 shows upon 
which condition the permutation matrix P equals the rank profile matrix TZa- 
Note that although the rank profile A is trivial in such cases, the matrix TZa 
still carries important information on the row and column rank profiles of all 
leading sub-matrices of A. 


Corollary 2. Let A he an m x n matrix. 

If A has generic column rank profile, then any PLU decomposition A = PLU 
computed using reverse lexicographic order search and row rotations is such that 


TZa = P 



In particular, P 


TZa if r = m. 


If A has generic row rank profile, then any LUP decomposition A = LUP 
computed using lexicographic order search and column rotations is such that 


TZa 



P. In particular, P = TZa if r = n. 


Proof. Consider A has generic column rank profile. From table 1, any PLUQ 
decomposition algorithm with a reverse lexicographic order based search and 


rotation based row permutation is such that flp^g = P 


Q = TZa. Since the 


search follows the reverse lexicographic order and the matrix has generic column 
rank profile, no column will be permuted in this elimination, and therefore 
Q = In- The same reasoning hold for when A has generic row rank profile. □ 


Note that the L and U factors in a PLU decomposition are uniquely de¬ 
termined by the permutation P. Hence, when the matrix has full row rank, 
P = TZa and the decomposition A = IZaI^U is unique. Similarly the decompo¬ 
sition A = LUTZ A is unique when the matrix has full column rank. Now when 
the matrix is rank deficient with generic row rank profile, there is no longer a 
unique PLU decomposition revealing the rank profile matrix: any permutation 
applied to the last m — r columns of P and the last m — r rows of L yields a 

PLU decomposition where TZa = P . 


Lastly, we remark that the only situation where the rank profile matrix TZa 
can be read directly as a sub-matrix of P or Q is as in corollary 2, when the 
matrix A has generic row or column rank profile. Consider a PLUQ decom¬ 


position A = PLUQ revealing the rank profile matrix {TZa = P 


’’ Q) such 
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that TZa is a sub-matrix of P. This means that P = TZa + S where S has 
disjoint row and column support with TZa- We have TZa = {TZa + S) 


{TIa + S) 


Qi 

0(n—r) X n 


Hence TZA{In — 


Qi 

b(n—r) X n 


) = 5 


row support of these matrices are disjoint, hence TZa 


0 


Q = 

but the 


Qi 

b(n—r) X n 

= 0 which implies 


that A has generic column rank profile. Similarly, one shows that TZa can be a 
sub-matrix of Q only if A has a generic row rank profile. 


7 Improvements in practice 

In our previous contribution [5], we identified the ability to recover the rank 
profile matrix via the use of the product order search and of rotations. Hence 
we proposed an implementation combining a tile recursive algorithm and an 
iterative base case, using these search and permutation strategies. 

The analysis of sections 4 and 5 shows that other pivoting strategies can be 
used to compute the rank profile matrix, and preserve the monotonicity. We 
present here a new base case algorithm and its implementation over a finite 
field that we wrote in the FFLAS-FFPACK library^. It is based on a lexicographic 
order search and row and column rotations. Moreover, the schedule of the 
update operations is that of a Grout elimination, for it reduces the number of 
modular reductions, as shown in [3, § 3.1]. Algorithm 1 summarizes this variant. 


Algorithm 1 Grout variant of PLUQ with lexicographic search and column 
rotations_ 

1: fc ^ 1 

2: for i = 1.. .m do 

3: i Ai j^ ji Ai i ]^—i X Ai A—l,k..n 

4 : if Ai^k..n = 0 then 

5: Loop to next iteration 

6: end if 

7 : Let Ai^s be the left-most non-zero element of row i. 

8: ^ s i 5 A-i^i jji i A — 1 ^ Ai A:—I s 

9 * A^-^l_-fyis I Ai^s 

10 : Bring to by column rotation 

11: Bring * to Aj, * by row rotation 

12: k ^ k+l 

13: end for 


^FFLAS-FFPACK revision 1193, http://linalg.org/projects/fflas-ffpack, linked 
against OpenBLAS-vO.2.8. 
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PLUQ base cases mod 131071. Rank = n/2. on a i5-3320 at 2.6GHz 



PLUQ base cases mod 131071. Rank = n/2. on an i5-3320 at 2.6GHz 



Figure 1: Computation speed of PLUQ decomposition base cases. 

In the following experiments, we measured the real time of the computation 
averaged over 10 instances (100 for n < 500) of n x n matrices with rank 
r = n/2 for any even integer value of n between 20 and 700. In order to ensure 
that the row and column rank profiles of these matrices are uniformly random, 
we construct them as the product A = LTZU, where L and U are random 
non-singular lower and upper triangular matrices and TZ is an m x n r-sub- 
permutation matrix whose non-zero elements positions are chosen uniformly 
at random. The effective speed is obtained by dividing an estimate of the 
arithmetic cost (2mnr + 2/3r^ — r^(m -I- n)) by the computation time. 
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Figure 1 shows its computation speed (3), compared to that of the pure 
recursive algorithm (6), and to our previous base case algorithm [5], using a 
product order search, and either a left-looking (4) or a right-looking (5) schedule. 
At n = 200, the left-looking variant (4) improves over the right looking variant 
(5) by a factor of about 2.14 as it performs fewer modular reductions. Then, the 
Grout variant (3) again improves variant (4) by a factor of about 3.15. Lastly 
we also show the speed of the final implementation, formed by the tile recursive 
algorithm cascading to either the Grout base case (1) or the left-looking one (2). 
The threshold where the cascading to the base case occurs is experimentally set 
to its optimum value, i.e. 200 for variant (1) and 70 for variant (2). This 
illustrates that the gain on the base case efficiency leads to a higher threshold, 
and improves the efficiency of the cascade implementation (by an additive gain 
of about 2.2 effective Gfops in the range of dimensions considered). 

8 Computing Echelon forms 

Usual algorithms computing an echelon form [13, 9] use a slab block decomposi¬ 
tion (with row or lexicographic order search), which implies that pivots appear 
in the order of the echelon form. The column echelon form is simply obtained 
as C = PL from the PLUQ decomposition. Using product order search, this 
is no longer true, and the order of the columns in L may not be that of the 
echelon form. Algorithm 2 shows how to recover the echelon form in such cases. 
Note that both the row and the column echelon forms can thus be computed 


Algorithm 2 Echelon form from a PLUQ decomposition 
Input: P, L, U, Q, a PLUQ decomp, of A with TZa = 
Output: C: the column echelon form of A 
1 : C ^ PL 

2: (pi, ..,Pr) = Sort((Tp(l),.., crp(r)) 

3: for i = l..r do 

4: T = (crp^(pi),..,crp^(pr),r-k 1 ,..,to) 

5: end for 

6: C ^ CPr 


from the same PLUQ decomposition. Lastly, the column echelon form of the 
ix j leading sub-matrix, is computed by removing rows of PL below index i and 
filtering out the pivots of column index greater than j. The latter is achieved 
by replacing line 2 by (pi, ..,ps) = Sort({(Tp(f) : ctq(j) < j}). 
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